Train Test Validation Suite

The suite is composed of various checks, such as Property Drift, Label Drift, and Text Embeddings Drift.
Each check may contain conditions (each of which resolves to pass, fail, warning, or error), as well as other outputs such as plots or tables.
Suites, checks, and conditions can all be modified. Read more about custom suites.


Conditions Summary

Check | Condition | More Info
Property Drift | categorical drift score < 0.2 and numerical drift score < 0.2 | Passed for 11 columns out of 11 columns. Found column "Language" has the highest categorical drift score: 0. Found column "Average Word Length" has the highest numerical drift score: 0.04
Label Drift | Label drift score < 0.15 | Label's drift score Cramer's V is 0
Train Test Samples Mix | Percentage of test data samples that appear in train data is less or equal to 5% | Percent of test data samples that appear in train data: 3.49%

Check With Conditions Output

Property Drift

Calculate drift between train dataset and test dataset per feature, using statistical measures.

Conditions Summary
Condition | More Info
categorical drift score < 0.2 and numerical drift score < 0.2 | Passed for 11 columns out of 11 columns. Found column "Language" has the highest categorical drift score: 0. Found column "Average Word Length" has the highest numerical drift score: 0.04
Additional Outputs
The drift score measures the difference between two distributions; in this check, the test and train distributions.
The check shows the drift score and distributions for the properties, sorted by drift score; only the top 5 properties are shown.
Discrete distribution plots show the 10 categories with the largest difference between train and test.
The following columns do not have enough samples to calculate drift score: ['Subjectivity', 'Sentiment', 'Reading Ease']
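
The condition above can be read as a per-column threshold test. The following is a minimal sketch of that logic, not deepchecks' own implementation: the function name `evaluate_drift_condition` and its parameters are hypothetical, and the per-column scores are assumed to be already computed. Only the two scores quoted in the report are used as input.

```python
from typing import Dict, Tuple


def evaluate_drift_condition(
    categorical_scores: Dict[str, float],
    numerical_scores: Dict[str, float],
    max_categorical: float = 0.2,  # threshold quoted in the report
    max_numerical: float = 0.2,    # threshold quoted in the report
) -> Tuple[bool, str]:
    """Return (passed, message) for precomputed per-column drift scores."""
    # A column fails when its score is not strictly below its threshold.
    failed = [c for c, s in categorical_scores.items() if s >= max_categorical]
    failed += [c for c, s in numerical_scores.items() if s >= max_numerical]

    total = len(categorical_scores) + len(numerical_scores)
    parts = [f"{total - len(failed)} of {total} columns passed"]

    # Report the highest-scoring column of each kind, as the summary does.
    for kind, scores in (("categorical", categorical_scores),
                         ("numerical", numerical_scores)):
        if scores:
            name, score = max(scores.items(), key=lambda kv: kv[1])
            parts.append(f'highest {kind} drift: "{name}" = {score}')

    return (not failed, "; ".join(parts))


# Scores taken from the report; other columns are omitted here.
passed, message = evaluate_drift_condition(
    {"Language": 0.0}, {"Average Word Length": 0.04}
)
print(passed, message)
```

Keeping the thresholds as parameters mirrors the note that conditions can be modified.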


Label Drift

Calculate label drift between train dataset and test dataset, using statistical measures.

Conditions Summary
Condition | More Info
Label drift score < 0.15 | Label's drift score Cramer's V is 0
Additional Outputs
The drift score measures the difference between two distributions; in this check, the test and train distributions.
The check shows the drift score and distributions for the label.
Discrete distribution plots show the 10 categories with the largest difference between train and test.
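
The report names Cramer's V as the label drift score. As an illustration, here is a minimal sketch of the plain, uncorrected Cramer's V computed on a 2 x k contingency table (train vs. test by label category). The function name `cramers_v` is hypothetical, and the library's own computation may differ, for example by applying a bias correction or special handling for rare categories.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def cramers_v(train_labels: Sequence[Hashable],
              test_labels: Sequence[Hashable]) -> float:
    """Uncorrected Cramer's V between the train and test label distributions."""
    if not train_labels or not test_labels:
        raise ValueError("both label sequences must be non-empty")

    train_counts, test_counts = Counter(train_labels), Counter(test_labels)
    categories = sorted(set(train_counts) | set(test_counts), key=str)

    # Rows: train, test. Columns: one per label category.
    rows = [[train_counts[c] for c in categories],
            [test_counts[c] for c in categories]]
    n = len(train_labels) + len(test_labels)
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    min_dim = min(len(rows), len(categories)) - 1
    if min_dim == 0:  # a single category everywhere: no drift is measurable
        return 0.0
    return sqrt(chi2 / (n * min_dim))


# Same label proportions in both splits give a score of 0, as in the report.
print(cramers_v(["pos"] * 50 + ["neg"] * 50, ["pos"] * 25 + ["neg"] * 25))
```

A score of 0 means the label proportions match exactly; a score of 1 means the two splits share no label categories.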


Train Test Samples Mix

Detect samples in the test data that also appear in the training data.

Conditions Summary
Condition | More Info
Percentage of test data samples that appear in train data is less or equal to 5% | Percent of test data samples that appear in train data: 3.49%
Additional Outputs
3.49% (18 / 516) of test data samples also appear in train data
Train Sample IDs | Test Sample IDs | Test Text Sample | Number of Test Duplicates
1249 | 323 | عاش قيس سعيد غصبن علي قناه الج... | 1
578, 1468, 1755 | 408 | كلنا قيس السعيد | 1
664, 1202, 1853 | 242 | مجنون يحكم دوله | 1
72, 339, 483, 573, 669, 1093, 1206, 2216... | 260, 453 | يحيا قيس سعيد | 2
249, 571, 1729, 2243, 2370 | 472 | يسقط الانقلاب | 1
2047 | 43 | تحيا لقيس سعيد | 1
212, 1388, 2121, 2238 | 285, 312 | المرزوقي يشرف كل تونسي حر ويكف... | 2
1331 | 111 | قيسون يا معذبهم | 1
502, 859 | 338 | اللهم انصر قيس سعيد علي شياطين | 1
277, 1763, 2030 | 282 | كلنا قيسون | 1
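
The overlap figure above can be approximated with a simple lookup. This is a rough sketch under stated assumptions, not the check's actual algorithm: `normalize` and `train_test_mix` are hypothetical names, and the normalization used here (lowercasing and collapsing whitespace) is an assumption about what counts as the "same" sample.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def normalize(text: str) -> str:
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def train_test_mix(train: Sequence[str],
                   test: Sequence[str]) -> Tuple[float, Dict[int, List[int]]]:
    """Return the share of test samples found in train, plus the matches.

    The mapping goes from a test sample index to the list of train sample
    indices holding the same normalized text.
    """
    train_index: Dict[str, List[int]] = defaultdict(list)
    for i, text in enumerate(train):
        train_index[normalize(text)].append(i)

    overlaps: Dict[int, List[int]] = {}
    for j, text in enumerate(test):
        key = normalize(text)
        if key in train_index:  # membership test does not create a key
            overlaps[j] = train_index[key]

    ratio = len(overlaps) / len(test) if test else 0.0
    return ratio, overlaps


ratio, overlaps = train_test_mix(
    ["Hello world", "foo"], ["hello  world", "bar", "baz", "qux"]
)
print(f"{ratio:.2%} of test samples also appear in train")
print("condition passed:", ratio <= 0.05)  # 5% threshold from the report
```

With the counts from the report, 18 / 516 rounds to 3.49%, which is under the 5% threshold.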


Check Without Conditions Output


Other Checks That Weren't Displayed

Check | Reason
Text Embeddings Drift | Functionality requires embeddings, but the TextData object had none. To use this functionality, use the set_embeddings method to set your own embeddings with a numpy.array or use TextData.calculate_builtin_embeddings to add the default deepchecks embeddings.
